Abstract

In this paper, by introducing a new user similarity index based on the diffusion
process, we propose a modified collaborative filtering (MCF) algorithm, which has 
remarkably higher accuracy than the standard collaborative filtering. In the pro- 
posed algorithm, the degree correlation between users and objects is taken into 
account and embedded into the similarity index by a tunable parameter. The nu- 
merical simulation on a benchmark data set shows that the algorithmic accuracy of 
the MCF, measured by the average ranking score, is further improved by 18.19% 
in the optimal case. In addition, two significant criteria of algorithmic performance, 
diversity and popularity, are also taken into account. Numerical results show that 
the presented algorithm can provide more diverse and less popular recommendations.
For example, when the recommendation list contains 10 objects, the diversity,
measured by the Hamming distance, is improved by 21.90%.
1 Introduction



With the expansion of the Internet [1] and the wide application of Web 2.0,
helping people efficiently obtain the information they truly need has become a
challenging task [2]. Recommender systems have become an effective tool to
address the information overload problem by predicting users' interests
and habits from their historical selections or collections. They have been
used to recommend books and CDs at Amazon.com, movies at Netflix.com,
and news at Versifi Technologies (formerly AdaptiveInfo.com) [3]. Motivated
by its practical significance to e-commerce and society, the study of
recommender systems has attracted increasing attention and become an essential
issue in Internet applications such as e-commerce systems and digital library
systems [4]. A personalized recommender system includes three parts: data
collection, model analysis and the recommendation algorithm, among which
the algorithm is the core part. Various kinds of algorithms have been proposed
thus far, including collaborative filtering approaches [5,6,7,8,9], content-based
analyses [10,11], hybrid algorithms [12,13,14], and so on. For a review of
current progress, see Refs. [3,15] and the references therein.

One of the most successful recommendation algorithms, called collaborative
filtering (CF), has been developed and extensively investigated over the past
decade [5,6,16]. When predicting the potential interests of a given user, the CF
algorithm first identifies a set of similar users from the past records and then
makes predictions based on a weighted combination of those similar users'
opinions. Despite its wide application, the CF algorithm suffers from several
major limitations, including system scalability and accuracy [17]. Therefore,
current CF algorithms still require further improvement to make
recommendations more effective. Recently, some physical dynamics, including
mass diffusion [18,19] and heat conduction [20], have found applications in
personalized recommendation. Liu et al. [7] introduced the mass diffusion process
to compute the user similarity of CF, and found that the modified algorithm
has remarkably higher accuracy than the standard CF. In this method, however,
all objects and users are treated equally regardless of how much their degrees
differ; in other words, the degree correlations between objects and users are
neglected. For example, if a user with small degree has collected a small-degree
object, the edge connecting them represents a very specific taste of the user,
while the information contained in an edge connecting an active user and
a popular object is less meaningful. Therefore, we argue that the user
similarity index could be improved by considering the degree correlation of the
user-object bipartite network. The numerical results show that the improved
index, which depresses the influence of mainstream preferences, can provide more
accurate and more diverse recommendations.
2 Method



Suppose there are $m$ objects and $n$ users in a recommender system. Denote
the object set as $O = \{o_1, o_2, \ldots, o_m\}$ and the user set as
$U = \{u_1, u_2, \ldots, u_n\}$. A recommender system can be fully described by
an adjacency matrix $A = \{a_{ij}\} \in R^{m,n}$, where $a_{ij} = 1$ if $o_i$ is
collected by $u_j$, and $a_{ij} = 0$ otherwise.
In the standard CF, the user or object similarities are calculated first, and then
the predictions are computed accordingly. If $u_i$ has not yet collected $o_j$ (i.e.,
$a_{ji} = 0$), the predicted score, $v_{ij}$, is given as

$$v_{ij} = \frac{\sum_{l=1}^{n} s_{li}\, a_{jl}}{\sum_{l=1}^{n} s_{li}}, \qquad (1)$$

where $s_{li}$ is the similarity between users $u_l$ and $u_i$. The most widely used
similarity indices are the Sørensen index [?] and the Salton index [?]; however, they
only rely on the users' degrees and the number of commonly collected objects, without
considering the influence of the degree correlation between users and objects.
Inspired by the mass diffusion process proposed by Zhou et al. [19], Liu et al.
[7] proposed a modified CF that improves the algorithmic accuracy by using the
mass diffusion process to compute the user similarities, and they found that
the diversity of recommendations is also enhanced. Although this algorithm
improves on the standard CF, the degree correlation between users
and objects is not considered, so every edge makes the same contribution
to the diffusion process. If both $u_i$ and $u_j$ have selected an object $o_l$, they
probably have similar tastes or interests. If the degree of $o_l$ is very
large (object $o_l$ is very popular), this taste (the preference for $o_l$) is ordinary
and does not mean that $u_i$ and $u_j$ are very similar. Therefore, its contribution to
$s_{ij}$ should be weakened. On the other hand, if a user $u_i$ with small
degree has collected an unpopular object $o_l$ (the degree of $o_l$ is very small),
this taste is very specific, and the contribution of the edge connecting $u_i$
and $o_l$ should be enlarged. In short, an edge between a large-degree user
and a popular object carries little information, while an edge between a
small-degree user and an unpopular object contains rich information on personalized
preference. Accordingly, the contribution of the edge connecting $u_i$ and $o_l$
should be negatively correlated with $k(u_i)k(o_l)$. We assume a certain amount
of resource (e.g. recommendation power) is associated with each user, and the
weight $s_{ij}$ represents the proportion of the resource $u_j$ would like to distribute
to $u_i$. Following a network-based resource-allocation process in which each user
distributes his/her initial resource to all the objects he/she has collected, and
then each object sends back what it has received to all the users who collected
it, and considering the correlation between users and objects, the weight $s_{ij}$ (the
fraction of initial resource $u_j$ eventually gives to $u_i$) can be expressed as

$$s_{ij} = \frac{1}{k(u_j)} \sum_{l=1}^{m} \frac{a_{lj}\,(k_{u_j} k_{o_l})^{\lambda} \cdot a_{li}\,(k_{u_i} k_{o_l})^{\lambda}}{k(o_l)}, \qquad (2)$$

where $\lambda$ is a tunable parameter controlling the effect of degree correlation.
Based on the above definition, given a target user $u_i$, the algorithm proceeds
as follows: (i) calculate the user similarity matrix $\{s_{ij}\} \in R^{n,n}$ based on
the diffusion process, as shown in Eq. (2); (ii) for each user $u_i$, calculate the
predicted scores for his/her uncollected objects according to Eq. (1); (iii)
sort the uncollected objects in descending order of the predicted scores,
and recommend those at the top.
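
As an illustration only, steps (i) and (ii) can be sketched in Python with NumPy. The sketch rests on an assumption about the similarity index: that the edge between $u_i$ and $o_l$ is weighted by $a_{li}(k_{u_i}k_{o_l})^{\lambda}$, and that $s_{ij}$ is the sum over objects of the product of the two edge weights divided by $k(o_l)$, normalized by $k(u_j)$, so that $\lambda = 0$ recovers the diffusion-based similarity of Ref. [7]. The function names `mcf_similarity` and `predict_scores` are hypothetical and do not come from the original work.

```python
import numpy as np

def mcf_similarity(A, lam):
    """Degree-weighted diffusion similarity (assumed form, see lead-in).

    A   : (m objects x n users) 0/1 array, A[l, i] = 1 if u_i collected o_l.
    lam : tunable exponent (lambda in the text).
    Returns S with S[i, j] = s_ij.
    """
    A = np.asarray(A, dtype=float)
    k_o = A.sum(axis=1)                       # object degrees k(o_l)
    k_u = A.sum(axis=0)                       # user degrees k(u_i)
    # Zero degrees are replaced by 1; such rows/columns carry no edges anyway.
    k_o = np.where(k_o > 0, k_o, 1.0)
    k_u = np.where(k_u > 0, k_u, 1.0)
    # Edge weight W[l, i] = a_li * (k(u_i) * k(o_l)) ** lam
    W = A * np.outer(k_o, k_u) ** lam
    # S[i, j] = (1 / k(u_j)) * sum_l W[l, i] * W[l, j] / k(o_l)
    S = (W / k_o[:, None]).T @ W
    return S / k_u[None, :]

def predict_scores(A, S):
    """Eq. (1): V[i, j] = sum_l s_li a_jl / sum_l s_li, for user i and object j.

    The sums run over all users l, as the equation is written.
    Already collected objects are masked with -inf so they are never ranked.
    """
    A = np.asarray(A, dtype=float)
    num = (A @ S).T                           # num[i, j] = sum_l s_li a_jl
    den = S.sum(axis=0)                       # den[i]    = sum_l s_li
    den = np.where(den > 0, den, 1.0)
    V = num / den[:, None]
    V[A.T > 0] = -np.inf
    return V
```

Step (iii) then amounts to sorting each row of `V` in descending order. With `lam = 0` and no zero-degree users or objects, each column of the returned matrix sums to one, which is a convenient sanity check on the resource-allocation interpretation.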



3 Algorithmic performance metrics 

To test a recommendation method on a dataset we randomly remove 10% of 
the links as the probe set and apply the algorithm to the remainder (training 
set) to produce a recommendation list for each user. We then employ three 
different metrics, one to measure accuracy in recovery of deleted links and two 
to measure recommendation popularity and diversity. 
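
A minimal sketch of the data division described above, assuming the data are held as a 0/1 object-by-user NumPy array. The function name `split_probe`, the `seed` argument and the use of a seeded random generator are illustrative assumptions, not details given in the text.

```python
import numpy as np

def split_probe(A, probe_fraction=0.1, seed=0):
    """Randomly move a fraction of the links into a probe set.

    A : (m objects x n users) 0/1 array.
    Returns (train, probe), where train is a copy of A with the probe links
    removed and probe is a list of (user, object) index pairs.
    """
    rng = np.random.default_rng(seed)
    A = np.asarray(A)
    objs, users = np.nonzero(A)               # one entry per existing link
    n_probe = int(round(probe_fraction * objs.size))
    chosen = rng.choice(objs.size, size=n_probe, replace=False)
    train = A.copy()
    train[objs[chosen], users[chosen]] = 0    # delete the probe links
    probe = list(zip(users[chosen].tolist(), objs[chosen].tolist()))
    return train, probe
```

Calling the function with different seeds gives independent data-set divisions over which results can be averaged.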

3.1 Average ranking score 

An accurate method will put preferable objects in higher places. The average
ranking score is adopted to measure the accuracy, and is defined as follows.
For an arbitrary user $u_i$, if the entry $u_i$-$o_j$ is in the probe set (according to the
training set, $o_j$ is an uncollected object for $u_i$), we measure the position of $o_j$
in the ordered list. For example, if there are $L_i = 10$ uncollected objects for
$u_i$, and $o_j$ is the 3rd from the top, we say the position of $o_j$ is 3/10, denoted
by $r_{ij} = 0.3$. Since the probe entries are actually collected by users, a good
algorithm is expected to rank them highly, leading to small
$r_{ij}$. Therefore, the mean value of the position, $\langle r \rangle$, averaged over all the
entries in the probe set, can be used to evaluate the algorithmic accuracy: the smaller
the average ranking score, the higher the algorithmic accuracy, and vice versa.
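
The definition above can be sketched as follows. The function name `average_ranking_score` and the array layout (scores as a user-by-object array, training data as an object-by-user array) are assumptions made for illustration; ties are resolved in favor of the probe object, which is also an assumption, since the text does not specify tie handling.

```python
import numpy as np

def average_ranking_score(V, train, probe):
    """Mean relative position of probe objects among each user's uncollected objects.

    V     : (n users x m objects) array of predicted scores.
    train : (m objects x n users) 0/1 training array.
    probe : iterable of (user, object) index pairs removed from the data.
    """
    train = np.asarray(train)
    positions = []
    for i, j in probe:
        uncollected = np.flatnonzero(train[:, i] == 0)
        L_i = uncollected.size
        # Position = 1 + number of uncollected objects scored strictly higher.
        rank = 1 + int(np.sum(V[i, uncollected] > V[i, j]))
        positions.append(rank / L_i)
    return float(np.mean(positions))
```

For a user with 10 uncollected objects whose probe object has the third-highest score, the function returns the position 3/10 used in the example above.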

3.2 Popularity and diversity 

Besides accuracy, the average degree of all recommended objects, $\langle k \rangle$, and the
mean value of the Hamming distance, $S$, are taken into account to measure the
algorithmic popularity and diversity [21]. A smaller average degree, corresponding
to less popular objects, is preferred, since lower-degree
objects are hard for users to find by themselves. For example, suppose there
are 10 perfect movies not yet known to user $u_i$, 7 of which are widely popular,
while the other three fit a certain specific taste of $u_i$. An algorithm
recommending the 7 popular movies is very nice for $u_i$, but he may feel even better
about a recommendation list containing those three unpopular movies. In
addition, a personalized recommendation algorithm should present different
recommendations to different users according to their tastes and habits. The
diversity can be quantified by the average Hamming distance, $S = \langle H_{ij} \rangle$,
where $H_{ij} = 1 - Q_{ij}/L$, $L$ is the length of the recommendation list and $Q_{ij}$ is
the number of objects shared by the recommendation lists of $u_i$ and $u_j$. The largest
value, $S = 1$, indicates that the recommendations to all of the users are totally
different; in other words, the system has the highest diversity. The smallest value,
$S = 0$, means that the recommendations for different users are exactly the same.

Fig. 1. The average ranking score $\langle r \rangle$ vs. $\lambda$ for the algorithm. The optimal
$\lambda_{opt}$, corresponding to the minimal $\langle r \rangle = 0.0998$, is $\lambda_{opt} = -0.96$.
When $\lambda = 0$, the algorithm degenerates to the CF based on the diffusion process.
All the data points are averaged over ten independent runs with different data-set
divisions.
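
The popularity and diversity measures of Section 3.2 can be sketched as below, under the assumption that every user receives a list of the same length $L$ and that the Hamming distance is averaged over all pairs of users. The function names `top_L_lists`, `mean_recommended_degree` and `mean_hamming_distance` are hypothetical.

```python
import numpy as np
from itertools import combinations

def top_L_lists(V, L):
    """Indices of the L highest-scored objects for each user (rows of V)."""
    return np.argsort(-V, axis=1, kind="stable")[:, :L]

def mean_recommended_degree(rec_lists, train):
    """Average degree <k> of all recommended objects, using training-set degrees."""
    k_o = np.asarray(train).sum(axis=1)       # object degrees
    return float(k_o[rec_lists].mean())

def mean_hamming_distance(rec_lists):
    """S = <H_ij> with H_ij = 1 - Q_ij / L, averaged over all user pairs.

    Requires at least two users.
    """
    L = rec_lists.shape[1]
    sets = [set(row.tolist()) for row in rec_lists]
    dists = [1.0 - len(a & b) / L for a, b in combinations(sets, 2)]
    return float(np.mean(dists))
```

The pairwise loop is quadratic in the number of users, which is acceptable for a data set of this size but would need a different approach for much larger systems.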



4 Numerical results 

A benchmark dataset, namely MovieLens, is used to test the above algorithm.
It consists of 1682 movies (objects) and 943 users. The users rate movies
with discrete ratings from one to five. We applied a coarse-graining method: a
movie is considered to be collected by a user only if the given rating is larger than
2. The original data contains $10^5$ ratings, 85.25% of which are larger than 2;
that is, the user-object (user-movie) bipartite network after the coarse graining
contains 85250 edges.

Fig. 2. Average degree of recommended objects, $\langle k \rangle$, vs. $\lambda$. Squares, circles
and triangles represent lengths $L = 10$, 20 and 50, respectively. All the data points
are averaged over ten independent runs with different data-set divisions.

Fig. 3. The diversity $S$ vs. $\lambda$. Squares, circles and triangles represent the lengths
$L = 10$, 20 and 50, respectively. All the data points are averaged over ten
independent runs with different data-set divisions.
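
The coarse-graining step can be sketched as follows, assuming the ratings are available as (user, object, rating) triples with 0-based indices. The function name `coarse_grain` and this input layout are assumptions for illustration; the actual file format of the data set is not described in the text.

```python
import numpy as np

def coarse_grain(ratings, n_objects, n_users, threshold=2):
    """Build the 0/1 object-by-user matrix from rating triples.

    ratings   : iterable of (user, object, rating) with 0-based indices.
    threshold : an object counts as collected only if rating > threshold.
    """
    A = np.zeros((n_objects, n_users), dtype=np.int8)
    for user, obj, rating in ratings:
        if rating > threshold:
            A[obj, user] = 1
    return A
```

The number of edges in the resulting bipartite network is simply `A.sum()`, which for the data described above should equal the stated 85250.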

Figure 1 reports the algorithmic accuracy as a function of $\lambda$. The curve has a
clear minimum around $\lambda = -0.96$, which strongly supports the above discussion
that depressing the influence of users or objects with large degrees
can enhance the accuracy. Compared with the routine case ($\lambda = 0$), the
average ranking score is reduced by 18.19% in the optimal case, which
is indeed a great improvement. Figure 2 reports the average degree of all
recommended movies as a function of $\lambda$. When $\lambda < 0$, the average object degree is
positively correlated with $\lambda$; thus depressing the influence of edges connecting
active users and popular objects gives more opportunity to unpopular
objects, which is consistent with our expectation. Figure 3 exhibits a negative
correlation between $S$ and $\lambda$, indicating that depressing the influence of
the edges connecting active users and popular objects makes the recommendations
more personalized. When $L = 10$, the diversity $S$ is increased from
0.661 (corresponding to the case $\lambda = 0$) to 0.806 (corresponding to the
optimal case $\lambda = -0.96$), an improvement of 21.90%.



5 Conclusions and discussions 

In this paper, a modified collaborative filtering algorithm is presented to
improve the algorithmic accuracy by depressing the influence of edges connecting
active users and popular objects. The algorithmic accuracy, measured by the
average ranking score, can be improved by 18.19%. Besides accuracy, two
significant criteria of algorithmic performance, popularity and diversity, are also
taken into account. A good recommendation algorithm should not only have
higher accuracy, but also help users uncover hidden information,
corresponding to objects with low degrees. Therefore, the average object
degree is a meaningful measure for a recommendation algorithm. In
addition, a personalized recommender system should provide each user with
personalized recommendations according to his/her interests and habits; therefore,
the diversity of recommendations plays a crucial role in quantifying the
personalization. The numerical results show that the presented algorithm outperforms
the standard CF in all three criteria: accuracy, popularity and diversity.

Since power computation takes much more time than multiplication, this
algorithm takes longer to compute the user similarities. From the
numerical simulation results, we find that the optimal $\lambda_{opt}$ is close to -1.
When $\lambda = -1$, the corresponding $\langle r \rangle = 0.0995$, an improvement of 18.07%,
and the average object degree and diversity are even better: the
diversity $S$ is improved by 23.40%. Therefore, in real applications, the
parameter could be set to -1, which ensures that the algorithmic complexity
is the same as that of a parameter-free CF.

How to automatically find relevant information for diverse users is a
long-standing challenge in modern information science. The presented algorithm
could also be used to find relevant reviewers for scientific papers or
funding applications [22,23], and for link prediction in social and biological
networks [24]. We believe the current work can enlighten readers in this
promising direction.






We acknowledge GroupLens Research Group for providing us with the data set. This
work is partially supported by SBF (Switzerland) project C05.0148 (Physics
of Risk), the National Natural Science Foundation of China (Grant Nos.
10635040 and 60744003), the Swiss National Science Foundation (project
205120-113842), the Specialized Research Fund for the Doctoral Program of
Higher Education of China (Grant No. 20060358065), and by the Research
Fund of the Education Department of Liaoning of China (20060140).



References 



[1] G.-Q. Zhang, G.-Q. Zhang, Q.-F. Yang, S.-Q. Cheng, T. Zhou, New Journal of
Physics 10 (2008) 12307. 

[2] P. Resnick, H. R. Varian, Commun. ACM 40 (1997) 56. 

[3] G. Adomavicius, A. Tuzhilin, IEEE Trans. Know. & Data Eng. 17 (2005) 734. 

[4] J. B. Schafer, J. A. Konstan, J. Riedl, Data Min. & Knowl. Disc. 5 (2001) 115. 

[5] J. L. Herlocker, J. A. Konstan, K. Terveen, J. Riedl, ACM Trans. Inform. Syst. 
22 (2004) 5.

[6] J. A. Konstan, B. N. Miller, D. Maltz, J. L. Herlocker, L. R. Gordon, J. Riedl, 
Commun. ACM 40 (1997) 77. 

[7] J.-G. Liu, B.-H. Wang, Q. Guo, Int. J. Mod. Phys. C 20 (2009) 285. 

[8] J.-G. Liu, T. Zhou, B.-H. Wang, Y.-C. Zhang, arXiv:0808.3726.

[9] R.-R. Liu, C.-X. Jia, T. Zhou, D.Sun, B.-H. Wang, Physica A 388 (2009) 462. 

[10] M. Balabanovic, Y. Shoham, Commun. ACM 40 (1997) 66. 

[11] M. J. Pazzani, Artif. Intell. Rev. 13 (1999) 393.

[12] M. Pazzani, D. Billsus, Machine Learning 27 (1997) 313. 

[13] C. Basu, H. Hirsh, W. Cohen, Technical Report WS-98-08, AAAI Press (1998) 
714. 

[14] N. Good, J. B. Schafer, J. A. Konstan, A. L. Borchers, B. Sarwar, J. Herlocker, 
J. Riedl, Proc. Conf. Am. Assoc. Artif. Intell. (1999) 439. 

[15] J.-G. Liu, M.Z.-Q. Chen, J. Chen, F. Deng, H.-T. Zhang, Z. Zhang, T. Zhou, 
Int. J. Inf. and Sys. Sci. 5(2) (2009) 230. 

[16] J. B. Schafer, D. Frankowski, J. L. Herlocker, S. Sen, Lect. Notes Comput. Sci. 
4321 (2007) 291. 

[17] B. Sarwar, G. Karypis, J. A. Konstan, J. Riedl, Proc. ACM Conf. Electro.
Commerce (2000) 158. 






[18] Y.-C. Zhang, M. Medo, J. Ren, T. Zhou, T. Li, F. Yang, Europhys. Lett. 80 
(2008) 68003. 

[19] T. Zhou, J. Ren, M. Medo, Y.-C. Zhang, Phys. Rev. E 76 (2007) 046115. 

[20] Y.-C. Zhang, M. Blattner, Y.-K. Yu, Phys. Rev. Lett. 99 (2007) 154301. 

[21] T. Zhou, L.-L. Jiang, R.-Q. Su and Y.-C. Zhang, Europhys. Lett. 81 (2008) 
58004. 

[22] J.-G. Liu, Y.-Z. Dang, Z.-T. Wang, Physica A 366 (2006) 578. 

[23] J.-G. Liu, Z.-G. Xuan, Y.-Z. Dang, Q. Guo, Z.-T. Wang, Physica A 377 (2007) 
302. 

[24] T. Zhou, L. Lü, Y.-C. Zhang, arXiv:0901.0553.






